Convolutional Neural Networks for Authorship Attribution of Short Texts

نویسندگان

  • Thamar Solorio
  • Paolo Rosso
  • Manuel Montes-y-Gómez
  • Prasha Shrestha
  • Sebastián Sierra
  • Fabio A. González
چکیده

We present a model to perform authorship attribution of tweets using Convolutional Neural Networks (CNNs) over character n-grams. We also present a strategy that improves model interpretability by estimating the importance of input text fragments in the predicted classification. The experimental evaluation shows that text CNNs perform competitively and are able to outperform previous methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Authorship Attribution with Convolutional Neural Networks and POS-Eliding

We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS-tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict...

متن کامل

Character-level and Multi-channel Convolutional Neural Networks for Large-scale Authorship Attribution

Convolutional neural networks (CNNs) have demonstrated superior capability for extracting information from raw signals in computer vision. Recently, characterlevel and multi-channel CNNs have exhibited excellent performance for sentence classification tasks. We apply CNNs to large-scale authorship attribution, which aims to determine an unknown text’s author among many candidate authors, motiva...

متن کامل

Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks

Recent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural ...

متن کامل

Combination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts

Sentiment analysis of short texts is challenging because of the limited contextual information they usually contain. In recent years, deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been applied to text sentiment analysis with comparatively remarkable results. In this paper, we describe a jointed CNN and RNN architecture, taking advant...

متن کامل

PREPRINT VERSION An agent-driven semantical identifier using radial basis neural networks and reinforcement learning

Due to the huge availability of documents in digital form, and the deception possibility raise bound to the essence of digital documents and the way they are spread, the authorship attribution problem has constantly increased its relevance. Nowadays, authorship attribution, for both information retrieval and analysis, has gained great importance in the context of security, trust and copyright p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017